Thomas is a Senior Data Scientist at Pharmalex. He is passionate about the incredible possibilities that blockchain technology offers to make the world a better place. You can contact him on LinkedIn or Twitter.
Milana is a Data Scientist at Pharmalex. She is passionate about the power of analytical tools to discover the truth about the world around us and guide decision making. You can contact her on LinkedIn.
What is the Blockchain: A blockchain is a growing list of records, called blocks, that are linked together using cryptography. It is used for recording transactions, tracking assets, and building trust between participating parties. Primarily known as the technology behind Bitcoin and other cryptocurrencies, blockchain is now used in almost all domains, including supply chain, healthcare, logistics, identity management… Some blockchains are public and can be accessed by anyone, while others are private. Hundreds of blockchains exist, each with its own specifications and applications: Bitcoin, Ethereum, Tezos…
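The cryptographic linking mentioned above can be illustrated with a toy sketch in base R. The hash function here is a deliberately simplistic stand-in for a real cryptographic hash such as SHA-256, purely for illustration: each block stores the hash of its predecessor, so altering any earlier block breaks the chain.

```r
# Toy stand-in for a cryptographic hash: NOT secure, illustration only
toy_hash <- function(x) sum(utf8ToInt(paste(x, collapse = ""))) %% 97651

# Build a chain where each block records the hash of the previous one
make_block <- function(data, prev_hash) {
  block <- list(data = data, prev_hash = prev_hash)
  block$hash <- toy_hash(c(block$data, block$prev_hash))
  block
}

genesis <- make_block("genesis", prev_hash = 0)
block2  <- make_block("Alice pays Bob 5", genesis$hash)
block3  <- make_block("Bob pays Carol 2", block2$hash)

# Verifying the chain: each stored prev_hash must match a recomputed hash
chain_valid <- block2$prev_hash == toy_hash(c(genesis$data, genesis$prev_hash)) &&
               block3$prev_hash == toy_hash(c(block2$data, block2$prev_hash))
chain_valid  # TRUE

# Tampering with an earlier block invalidates every later one
genesis$data <- "genesis (tampered)"
block2$prev_hash == toy_hash(c(genesis$data, genesis$prev_hash))  # FALSE
```

In a real blockchain, the hash function is collision-resistant, so rewriting history is computationally infeasible without redoing the work for every subsequent block.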
What is Helium: Helium is a decentralized wireless infrastructure. It is a blockchain that leverages a decentralized global network of Hotspots. A hotspot is a sort of modem with an antenna that provides long-range connectivity (it can reach 200 times farther than conventional Wi-Fi!) between wireless “internet of things” (IoT) devices. These devices can be environmental sensors that monitor air quality or serve agricultural purposes, localisation sensors that track bike fleets,… People are incentivized to install hotspots and participate in the network by earning Helium tokens, which can be bought and sold like any other cryptocurrency. To learn more about Helium, read this excellent article.
What is R: R is a language widely used among statisticians and data miners for developing data analysis software.
This is the third article in a series on interacting with blockchains using R. Part I focused on some basic concepts related to blockchain, including how to read blockchain data. Part II focused on how to track NFT data transactions and visualise them. If you haven’t read these articles, I strongly encourage you to do so to get familiar with the tools and terminology we use in this third article: Part I and Part II.
Helium is an amazing project. Unlike traditional blockchain-related projects, it is not just about finance: it has real-world applications. It solves problems that exist for people outside the crypto world, and that is awesome. In the past, deploying a communication infrastructure was only possible for big companies. Thanks to the blockchain, this can now be done collectively by individuals.
The questions we are trying to answer here are: How big is the Helium network? Where are the hotspots located? Are they useful, or in other words, are they used to transfer data with connected devices? We will analyse all historical data from the first block of the blockchain up to the latest. We will generate some statistics and put emphasis on visualisation. I believe there is nothing better than a good graph to communicate a message.
To fetch the data, there are several possibilities:
When you work with a big dataset, things can get (very) slow. Here are two tricks to speed things up a bit:
Work with packages/functions adapted to handle large datasets. To read the data, we use fread from the data.table package. It is much faster than read.table and takes care of decompressing files automatically. For the data management, data.table is also much faster than the tidyverse, but I find the code written with the latter much easier to read. That’s why I use the tidy approach unless it struggles with an operation, in which case we switch to data.table.
Keep only the data you need, to save memory. This means discarding data we won’t use, such as columns with unimportant attributes, as well as deleting heavy objects once we no longer need them.
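As a minimal illustration of the first trick (the file content here is made up, standing in for the multi-GB blockchain export), fread can load only the columns of interest in a single call:

```r
library(data.table)

# A tiny CSV stands in for the multi-GB blockchain export
# (fread also handles .csv.gz files, decompressing them automatically)
tmp <- tempfile(fileext = ".csv")
writeLines(c("address,owner,first_timestamp,location_hex,unused_col",
             "addr1,owner1,2021-05-21,8828308281fffff,x",
             "addr2,owner2,2021-08-27,8828308283fffff,y"), tmp)

# `select` loads only the columns of interest, so the unused ones
# never take up memory
dt <- fread(tmp, select = c("address", "owner", "location_hex"))
dim(dt)  # 2 rows, 3 columns
```

On a file with dozens of columns, skipping the unneeded ones at read time makes a noticeable difference in both speed and memory footprint.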
An HTML version of this article, as well as the code used to generate it, is available on my Github.
The code below reads data about the hotspots and does some data management. We use the H3 package to convert Uber’s H3 index into latitude/longitude. H3 is a geospatial indexing system based on a hexagonal grid: lower resolutions cover large areas, while the highest resolution covers less than a square metre of the earth. Helium uses resolution 8. To give an idea, at this resolution the earth is covered by 691,776,122 hexagons (see here).
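The hexagon count quoted above can be sanity-checked with a quick back-of-the-envelope calculation (this is not part of the h3 package API, just arithmetic): H3 has 122 base cells at resolution 0 (110 hexagons plus 12 pentagons), and each step up in resolution subdivides a hexagon into 7 children.

```r
# Number of H3 cells at resolution r: the 12 pentagons subdivide into
# 6 children instead of 7, which works out to 2 + 120 * 7^r cells
h3_cell_count <- function(r) 2 + 120 * 7^r

h3_cell_count(0)  # 122 base cells
h3_cell_count(8)  # 691776122, matching the figure quoted above
```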
# First, let's load a few useful packages
library(knitr)
library(tidyverse)
library(data.table)
library(ggplot2)
library(gganimate)
library(hexbin)
library(h3)
library(lubridate)
library(sp)
library(rworldmap)
# Run this prior to loading library(rayshader) to
# send output to RStudio Viewer rather than external X11 window
options(rgl.useNULL = TRUE,
rgl.printRglwidget = TRUE)
library(rgl)
library(rayshader)
### Retrieve info on the hotspots
# dataHotspots <- fread(file = "data/gateway_inventory_01257002.csv.gz", select = c("address", "owner", "first_timestamp", "location_hex")) %>%
#   rename(hotspot = address,
#          firstDate = first_timestamp) %>%
#   filter(location_hex != "", # remove hotspots without location
#          firstDate != as.POSIXct("1970-01-01 00:00:00", tz = "UTC")) %>% # a few hotspots appear to have been installed in 1970, obviously a mistake in the database
#   mutate(data.frame(h3_to_geo(location_hex)), # get the centres of the given H3 indexes
#          hotspot = factor(hotspot),
#          firstDate = round_date(firstDate, "day"), # resolution up to the day is good enough
#          owner = factor(owner)) %>%
#   select(-location_hex)
#
# saveRDS(dataHotspots, "data/dataHotspots.rds")
dataHotspots <- readRDS("data/dataHotspots.rds")
This is what the hotspot dataset looks like. We have the address of the hotspot, the address of the owner (an owner is a Helium wallet to which several hotspots can be linked), the date the hotspot was first seen on the network, and its location on the globe.
glimpse(dataHotspots)
## Rows: 605,288
## Columns: 5
## $ hotspot <fct> 112SWsX8Xq5SANFn5Lm8Jp23dyAMwQibhm1yvuLUfQs92hRfFYtL, 112Yxc~
## $ owner <fct> 12zX3uhzkhY3voaoFbaAUGh4n25k5xpSBC7pK8mVmoa6pHZdf6u, 14eFe9G~
## $ firstDate <dttm> 2022-02-13, 2021-09-09, 2021-05-21, 2021-08-27, 2021-08-11,~
## $ lat <dbl> 33.72116, 52.20844, 50.36541, 54.26865, 42.50571, 34.27130, ~
## $ lng <dbl> -84.318520, 5.957381, 19.005351, 18.632860, -94.199723, -118~
Table 1 shows a few descriptive statistics on the hotspot dataset.
dataHotspots %>%
summarise( `Date range` = paste(min(firstDate), max(firstDate), sep = " - "),
`Duration` = round(max(firstDate) - min(firstDate)),
`Total number of hotspots` = length(levels(hotspot)),
`Total number of owners` = length(levels(owner))) %>%
t() %>%
kable(caption = "Descriptive statistics on the content of the hotspot dataset.")
| Date range | 2019-07-31 - 2022-03-08 |
| Duration | 951 days |
| Total number of hotspots | 605288 |
| Total number of owners | 245938 |
The first statistic we calculate is the number of hotspots per owner. Since there are a lot of owners, showing all the combinations is not an option. Plotting a histogram of the distribution is not an option either, as it is extremely skewed (there is an owner with about 2000 hotspots!). Therefore, we chose to bin the number of hotspots into categories (Table 2). We see that most owners have only one hotspot, but some own a great many.
dataHotspots %>%
group_by(owner) %>%
summarise(n = n()) %>%
mutate(`Number of hotspots per owner` = cut(n,
breaks = c(1, 2, 3, 4, 5, 9, 50, Inf),
labels = c("1", "2", "3", "4", "5-9", "10-50", ">50"),
include.lowest = TRUE)) %>%
group_by(`Number of hotspots per owner`) %>%
summarise(`Number of owners` = n()) %>%
mutate(`Proportion (%)` = round(`Number of owners`/sum(`Number of owners`)*100,2)) %>%
kable(caption = "Distribution of the hotspots across owners.")
| Number of hotspots per owner | Number of owners | Proportion (%) |
|---|---|---|
| 1 | 200729 | 81.62 |
| 2 | 16420 | 6.68 |
| 3 | 8730 | 3.55 |
| 4 | 5174 | 2.10 |
| 5-9 | 7760 | 3.16 |
| 10-50 | 6593 | 2.68 |
| >50 | 532 | 0.22 |
There are more than 500k hotspots in the world; that’s a lot. These hotspots didn’t appear in one day. In Figure 1, we visualise the growth of the network in terms of the number of hotspots added, using a cumulative plot. We see three phases: (1) a slow linear increase, (2) an exponential increase in mid-2021, followed by (3) a fast linear increase. My opinion is that the exponential increase could have continued a bit longer, but hotspot supply has been limited by the global chip shortage following the Covid pandemic. To give an idea, there were 6 months between my hotspot order and its delivery.
nHotspotsPerDate <- dataHotspots %>%
group_by(firstDate) %>%
summarise(count = n())
ggplot(nHotspotsPerDate, aes(x = firstDate, y = cumsum(count))) +
geom_line() +
labs(title = "Growth of the network infrastructure",
y = "Total number of hotspots (cumulative)",
x = "Date") +
scale_y_continuous(labels = function(x) format(x, scientific = FALSE),
breaks = seq(0, 5*10^5, length = 6))
Figure 1: Cumulative plot of the growth of the network infrastructure in terms of number of hotspots added to the network.
Since we have the location of these hotspots, we can also visualise where they are. We start by creating an empty world map on which we overlay the hotspot data. Plotting all the individual hotspots on a map would just be too much (there are more than 500k of them); the data is clearer to plot and interpret once it is summarised. We chose to bin the hotspots into hexagons using a function found on the web (function here), and we then plot them using the geom_hex ggplot2 function (Figure 2).
We see that most hotspots are located in North America, Europe and Asia, mostly in big cities. There are practically no hotspots in Africa and Russia, and very few in South America. Surprisingly, we see a few hotspots in the middle of the ocean. It could be a data issue, or cheating: sadly, people have found ways to increase their rewards by spoofing their hotspot’s location.
# create an empty world map
world <- map_data("world")
map <- ggplot() +
geom_map(
data = world, map = world,
aes(long, lat, map_id = region)
) +
scale_y_continuous(breaks=NULL) +
scale_x_continuous(breaks=NULL) +
theme(panel.background = element_rect(fill='white', colour='white'))
# bin the hotspot into hexagons
makeHexData <- function(df, nbins, xbnds, ybnds) {
h <- hexbin(df$lng, df$lat, nbins, xbnds = xbnds, ybnds = ybnds, IDs = TRUE)
data.frame(hcell2xy(h),
count = tapply(df$hotspot, h@cID, FUN = function(z) length(z)), # count the number of rows (points) falling in each hexagon
cid = h@cell)
}
# find the bounds for the complete data
xbndsHotspot <- range(dataHotspots$lng)
ybndsHotspot <- range(dataHotspots$lat)
nHotspotsHexbin <- dataHotspots %>%
group_modify(~ makeHexData(.x, nbins = 500,
xbnds = xbndsHotspot,
ybnds = ybndsHotspot))
map +
geom_hex(aes(x = x, y = y, fill = count),
stat = "identity",
data = nHotspotsHexbin) +
scale_fill_distiller(palette = "Spectral", trans = "log10") +
labs(title = "Hotspots localisation in the world",
fill = "Number of hotspots") +
theme(legend.position = "bottom")
Figure 2: Hotspots localisation in the world
On top of a visualisation, it is always useful to provide some numbers. Below we summarise the proportion of hotspots per continent. For this, we leverage the rworldmap package with a custom function from here, which converts longitude/latitude into continents. Table 3 shows that nearly half the hotspots are located in North America, followed by Europe with 30% and then Asia with 16%. Note the Undefined group, which probably refers to hotspots in the middle of the ocean or just at the border of a continent. Note also the four hotspots in… Antarctica.
# The single argument to this function, points, is a data.frame in which:
# - column 1 contains the longitude in degrees
# - column 2 contains the latitude in degrees
coords2continent = function(points)
{
countriesSP <- getMap(resolution='low')
# converting points to a SpatialPoints object setting CRS directly to that from rworldmap
pointsSP = SpatialPoints(points, proj4string=CRS(proj4string(countriesSP)))
# use 'over' to get indices of the Polygons object containing each point
indices = over(pointsSP, countriesSP)
return(data.frame(continent = indices$REGION, country = indices$ADMIN))
}
dataHotspots <- dataHotspots %>%
mutate(coords2continent(data.frame(.$lng, .$lat)),
continent = replace_na(as.character(continent), "Undefined"),
continent = factor(continent))
dataHotspots %>%
group_by(continent) %>%
summarise(count = n()) %>%
mutate(percentage = round(count/sum(count)*100,2)) %>%
arrange(desc(count)) %>%
kable(caption = "Distribution of hotspots per continent.")
| continent | count | percentage |
|---|---|---|
| North America | 281591 | 46.52 |
| Europe | 190414 | 31.46 |
| Asia | 97621 | 16.13 |
| Undefined | 18567 | 3.07 |
| South America | 9954 | 1.64 |
| Australia | 5142 | 0.85 |
| Africa | 1995 | 0.33 |
| Antarctica | 4 | 0.00 |
Now that these hotspots exist, we would like to know whether they are useful. Are they used by connected devices to transfer data? How often?
To answer this question, we download the whole history of data transfers. This is a huge dataset (3GB). On Helium, you only pay for the data you use. Every 24 bytes sent in an uplink or downlink packet costs 1 Data Credit (DC) = $0.00001. To get an idea of how much the network is used, we can look from two perspectives: (1) check the volume of data exchanged and (2) check how often the hotspots are used to transfer data from connected devices.
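The pricing rule above can be turned into a tiny helper. This is just a sketch based on the figures quoted (24 bytes = 1 DC = $0.00001); the assumption that packets are billed in whole 24-byte increments is ours.

```r
# Every 24-byte packet increment costs 1 Data Credit (DC) = $0.00001
# (assuming billing rounds up to whole increments)
dc_per_packet <- function(bytes) ceiling(bytes / 24)
usd_cost <- function(bytes) dc_per_packet(bytes) * 1e-5

# A 50-byte sensor reading spans three 24-byte increments
dc_per_packet(50)  # 3 DC
usd_cost(50)       # $0.00003

# Even a million such readings cost only a few dollars
usd_cost(50) * 1e6  # 30
```

This gives a feel for why the total dollar value of data transfer is tiny even with tens of millions of transactions.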
# ### Retrieve transferred packet data
listFilesTransactions <- list.files("data/packets", pattern=".csv.gz", recursive = T)
# We specify the columns we want to keep directly in the fread call to save memory
dataTransactions <- lapply(1:length(listFilesTransactions),function(i){
data <- fread(file = paste0("data/packets/",listFilesTransactions[i]), select = c("block", "transaction_hash", "time", "gateway", "num_dcs"))
return(data)
})
dataTransactions <- dplyr::bind_rows(dataTransactions) %>%
mutate(bytes = 24 * num_dcs, # Every 24 bytes sent in an uplink or downlink packet cost 1 DC = $.00001.
date = as.POSIXct(time, origin = "1970-01-01"),
date = round_date(date, "day"), # reduce the precision of the date to ease the plotting
gateway = factor(gateway)) %>%
select(-time, -num_dcs, -transaction_hash) %>%
rename(hotspot = gateway)
# let's combine the hotspot and packet datasets, keeping only rows matched in both
dataTransactionsWithLocation <- inner_join(dataTransactions, dataHotspots) %>%
mutate(hotspot = factor(hotspot, levels = levels(dataHotspots$hotspot))) %>% # this is to avoid dropping levels for hotspots not involved in any transaction
select(-owner, -firstDate)
# let's remove these two big datasets to save memory
rm("dataHotspots")
rm("dataTransactions")
saveRDS(dataTransactionsWithLocation, "data/dataTransactionsWithLocation.rds")
dataTransactionsWithLocation <- readRDS("data/dataTransactionsWithLocation.rds")
This is what the transaction dataset looks like. For each transaction, we have the block number, the address of the hotspot, the number of bytes transferred, the date, and the location of the hotspot.
glimpse(dataTransactionsWithLocation)
## Rows: 74,589,650
## Columns: 8
## $ block <int> 333619, 333669, 333958, 333958, 333958, 333958, 333958, 3339~
## $ hotspot <fct> 11tkAbgqHU2qU7GTiuwjggEDaYsmRDsbPsJjw5ezsu54coQE7Cu, 112DCTV~
## $ bytes <dbl> 24, 120, 216, 264, 4296, 48, 1920, 432, 2088, 29760, 2112, 4~
## $ date <dttm> 2020-05-15, 2020-05-15, 2020-05-15, 2020-05-15, 2020-05-15,~
## $ lat <dbl> 41.41625, 44.73126, 37.80697, 30.15410, 26.02164, 37.79103, ~
## $ lng <dbl> -122.38998, -68.82336, -122.27263, -95.40512, -80.17246, -12~
## $ continent <fct> North America, North America, North America, North America, ~
## $ country <fct> United States of America, United States of America, United S~
Table 4 shows a few descriptive statistics of the content of the dataset, as well as the volume of data exchanged so far. Clearly, the total volume exchanged between hotspots and connected devices is very small, if not ridiculous. That’s about the data volume I used on my smartphone in recent years. However, I don’t think this metric is a good indication of Helium usage. Indeed, the network is not intended to transfer huge volumes of data but to transfer data over long distances at a cheap price. Any comparison with another data transfer technology would not make sense. Below, we look at the second metric.
Note also that the first transaction happened on 2020-05-15, while the first hotspot was added to the network on 2019-07-31. In other words, there were about 9.5 months between the first hotspot and the first transaction. That’s probably because you need a critical mass of hotspots to convince device manufacturers to work with your network.
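The gap can be checked directly from the two dates reported in the tables above:

```r
firstHotspot     <- as.Date("2019-07-31")  # first hotspot seen on the network
firstTransaction <- as.Date("2020-05-15")  # first data transfer transaction

gap <- difftime(firstTransaction, firstHotspot, units = "days")
gap                      # 289 days
as.numeric(gap) / 30.44  # roughly nine and a half months
```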
dataTransactionsWithLocation %>%
summarise( `Date range` = paste(min(date), max(date), sep = " - "),
`Duration` = round(max(date) - min(date)),
`Block range` = paste(min(block), max(block), sep = " - "),
`Number of transactions` = n(),
`Total number of hotspots` = length(levels(hotspot)),
`Number of hotspots involved in at least one transaction` =
length(unique(hotspot)),
`Total data volume exchanged so far` =
paste(round(sum(dataTransactionsWithLocation$bytes) / 1e+12,3), "TB")) %>% # sum and convert bytes to terabytes
t() %>%
kable(caption = "Summary statistics on the content of the transaction dataset.")
| Date range | 2020-05-15 - 2022-03-06 |
| Duration | 660 days |
| Block range | 333619 - 1254990 |
| Number of transactions | 74589650 |
| Total number of hotspots | 605288 |
| Number of hotspots involved in at least one transaction | 358836 |
| Total data volume exchanged so far | 0.503 TB |
To determine how often the hotspots are used by the devices, we chose here to analyse the number of transactions. Each data transfer between a hotspot and a device corresponds to one transaction on the blockchain and one row of our dataset.
To summarise the evolution of the number of transactions, we apply the cumulative sum function to the number of transactions per date, and we further stratify by continent. Figure 3 is very similar to the figure above for the number of hotspots: a slow linear phase followed by an exponential one and then a fast linear phase (but what is this glitch at the end?). Surprisingly, despite having about 15% of the hotspots, Asia doesn’t seem very active in terms of data transfer, in contrast to North America and Europe.
# count the number of transaction per continent and date and calculate a cumulative sum
nTransactionsPerDatePerContinent <- dataTransactionsWithLocation %>%
group_by(continent, date) %>%
summarise(count = n()) %>%
group_by(continent) %>%
arrange(date) %>%
mutate(cumsum = cumsum(count)) %>%
arrange(continent)
ggplot(nTransactionsPerDatePerContinent, aes(x=date, y=cumsum, fill=continent)) +
geom_area() +
labs(title = "Growth of the number of transactions between hotspots and devices",
y = "Number of transactions",
x = "Date") +
scale_y_continuous(labels = function(x) format(x, scientific = FALSE))
Figure 3: Growth of the number of transactions between hotspots and devices, stratified by continent.
This is confirmed by the distribution of the total number of transactions per continent: we see that Asia represents only 3% of the total.
dataTransactionsWithLocation %>%
group_by(continent) %>%
summarise(count = n()) %>%
mutate(percentage = round(count/sum(count)*100,2)) %>%
arrange(desc(count)) %>%
kable(caption = "Distribution of the number of transactions per continent.")
| continent | count | percentage |
|---|---|---|
| Europe | 34210149 | 45.86 |
| North America | 33717205 | 45.20 |
| Undefined | 3839051 | 5.15 |
| Asia | 2285684 | 3.06 |
| South America | 292959 | 0.39 |
| Australia | 192795 | 0.26 |
| Africa | 51806 | 0.07 |
| Antarctica | 1 | 0.00 |
We can also check where the top 10 most active hotspots are located. Note that I use a data.table syntax here. I prefer the dplyr syntax for its readability, but we need to group by hotspot (500k of them!) and dplyr struggles. data.table takes only 2 seconds to compute this summary, impressive. We see that the most active hotspots are located in France, the US and Canada.
summaryTransactionPerHotspot <- dataTransactionsWithLocation[, .(`number of transactions` = .N),
by = c("hotspot", "country")] %>% # data.table syntax to speedup
arrange(desc(`number of transactions`))
summaryTransactionPerHotspot %>%
slice(1:10) %>%
kable(caption = "Top 10 most active hotspots.")
| hotspot | country | number of transactions |
|---|---|---|
| 11etKgw9Lb6FndJnU17pKQVtsgbPJRvzE8eHny4J5f78NFvEXUD | France | 14097 |
| 11aWe6V6HSRpMKL5zHATKscLAfuDJoc3Q3kW82BYGnmnNJnHHXj | United States of America | 12161 |
| 11QxjZpR4Xbzb6mpjGo1F9mXLzbCNgDyteqjduSqJUmTarWnyx1 | France | 12047 |
| 112TQVbGWMQDM2TVYAbkPbvSWK9LFApBWCtkLjuuKfd9BBheoMp9 | United States of America | 11692 |
| 11c4pxUfwby5rtz2PtRm4oxmndc8WAcQg5BxT7CNpU56hHqvp9h | United States of America | 11545 |
| 112kk7sLkuPybrPDE4ZPAYcAXzuPZbV3F2MH5adatdGohmRX5zJW | United States of America | 11048 |
| 112vq9i6viw7TLt5tzDm65k34Q4Lf1rPg3jwgYHd9CVxwadcNW4g | United States of America | 11042 |
| 112RMSnPo2bpJFdVoZAUxtAEYV9WuMTE6vw4PCgJeSbhuh56fG6G | United States of America | 10920 |
| 11HhXZonK1sxhu6CEuXgqBfzjv2x7L2E81BFVZQjYHuE5pmiHMa | Canada | 10603 |
| 11o9QZnsx4sivpbm72BQGgzqmBtmVt2bbap2oE8DuzLfDMeL2w5 | United States of America | 10423 |
Now we might wonder what proportion of hotspots are involved in transactions, and what the median number of transactions is.
medianNumberOfTransactions <- median(summaryTransactionPerHotspot$`number of transactions`)
propWith0Transactions <- length(which(table(dataTransactionsWithLocation$hotspot) == 0)) /
length(levels(dataTransactionsWithLocation$hotspot)) * 100
The median number of transactions per hotspot (excluding hotspots which didn’t participate in any transaction) is 43, and 40.72% of hotspots did not participate in any transaction so far. We cannot really say that all hotspots are useful… yet! The network still has a lot of spare capacity.
As we did above, we will visualise the transactions on the world map. We bin the data using the makeHexData function from above and overlay the map with the number of data transactions. This time, we create a longitudinal animation using the gganimate package (Figure 4). Although a direct comparison with Figure 3 is difficult since we have here an additional dimension (the colour refers to the number of transactions), the message is similar. We see that transactions happen mainly in North America before mid-2020. We then start seeing a strong increase in Europe and Asia. Nothing is visible in South America and Africa.
## bin the hotspot into hexagons
# find the bounds for the complete data
xbndsPacket <- range(dataTransactionsWithLocation$lng)
ybndsPacket <- range(dataTransactionsWithLocation$lat)
nTransactionsPerDateHexbin <- dataTransactionsWithLocation %>%
mutate(date = as.Date(round_date(date, "week"))) %>% # let's decrease the resolution to ease plotting
group_by(date) %>%
group_modify(~ makeHexData(.x,
nbins = 500,
xbnds = xbndsPacket,
ybnds = ybndsPacket))
pNumberOfTransactionsAnimated <- map +
geom_hex(aes(x = x, y = y, fill = count),
stat = "identity",
data = nTransactionsPerDateHexbin) +
scale_fill_distiller(palette = "Spectral",
trans = "log10") +
labs(title = "Evolution of the number of transactions",
fill = "Number of transactions") +
theme(legend.position = "bottom")
anim <- pNumberOfTransactionsAnimated +
transition_time(date) +
labs(title = "Date: {frame_time}",
subtitle = 'Frame {frame} of {nframes}')
animate(anim, nframes = length(unique(nTransactionsPerDateHexbin$date)))
Figure 4: Evolution of the number of transactions across the globe.
To add a bit of perspective, we can also turn the plot into 3D with the awesome rayshader package. Let’s focus on two countries: (1) the US, as it has the biggest number of hotspots and transactions, and (2) Belgium, my home country. We re-bin the data into hexagons since we generate a static plot and not an animation. It is possible to animate this 3D plot, but it takes a lot of computing time and fine tuning (see this).
Figure 5 shows the US map. We see that transactions are well distributed across the country, although the peaks (note that the legend is logarithmic!) are located around big cities (New York, Los Angeles, San Francisco, Miami).
# get the US map
US <- map_data("usa")
mapUS <- ggplot() +
geom_map(
data = US, map = US,
aes(long, lat, map_id = region)
) +
scale_y_continuous(breaks=NULL) +
scale_x_continuous(breaks=NULL) +
theme(panel.background = element_rect(fill='white', colour='white'))
# filter to keep only US transactions
dataTransactionsWithLocationUS <- dataTransactionsWithLocation %>%
filter(country == "United States of America") %>%
filter(lng > -140) # there are a few hotspots far from the mainland
# find the bounds for the complete data
xbndsPacketUS <- range(dataTransactionsWithLocationUS$lng)
ybndsPacketUS <- range(dataTransactionsWithLocationUS$lat)
# bin onto hexagons
nTransactionsUS <- dataTransactionsWithLocationUS %>%
group_modify(~ makeHexData(.x,
nbins = 250,
xbnds = xbndsPacketUS,
ybnds = ybndsPacketUS))
# generate the plot
pNumberOfTransactionsUS <- mapUS +
geom_hex(aes(x = x, y = y, fill = count),
stat = "identity",
data = nTransactionsUS) +
scale_fill_distiller(palette = "Spectral", trans = "log10") +
labs(title = "Distribution of the transactions in US",
fill = "Number of transactions") +
theme(legend.position = "bottom")
# add the 3D
plot_gg(pNumberOfTransactionsUS,
multicore = TRUE,
width = 8,
height= 8,
zoom = 0.7,
theta = 0,
phi = 70,
raytrace = TRUE)
rgl::rglwidget(width = 800, # this is to print the widget in the html document
height = 600)